Here we combine the essential elements of the four vignettes
to repeat the analyses with an alternative value for the minimum body size (100 g, as opposed to the original 4 g), as suggested by a reviewer (see Supplementary Material). We retain only the necessary details of the four vignettes; the four vignettes should be used for initial understanding, this one is to repeat the analysis with alternate assumptions (or data). We simplify much of the text here as well, to focus on the code.
This is simplified from MEPS_IBTS_1.html – see that vignette for full explanations.
Use the saved original data set:
dim(dataOrig)
#> [1] 178435 13
names(dataOrig)
#> [1] "AphiaID" "Survey" "Year"
#> [4] "Quarter" "Area" "Species"
#> [7] "LngtClas" "CPUE_number_per_hour" "Taxonomic.group"
#> [10] "a" "b" "weight_g"
#> [13] "CPUE_bio_per_hour"
Keep the desired columns, noting that data
will change a lot in the following code and will end up in a standard format:
numAreas = length(unique(dataOrig$Area))
colsKeep = c("Year",
"AphiaID",
"LngtClas",
"CPUE_number_per_hour",
"a",
"b",
"weight_g",
"CPUE_bio_per_hour")
colsDiscard = setdiff(names(dataOrig), colsKeep)
data = sizeSpectra::s_select(dataOrig, colsKeep) # uses Sebastian Kranz's s_dplyr_funcs.r
if(sum( colsKeep != c("Year", "AphiaID", "LngtClas", "CPUE_number_per_hour",
"a", "b", "weight_g", "CPUE_bio_per_hour")) > 0)
{ stop("Need to adjust renaming") }
names(data) = c("Year", "SpecCode", "LngtClass", "Number", "LWa", "LWb",
"bodyMass", "CPUE_bio_per_hour")
# CPUE_bio_per_hour is Number * bodyMass
Aggregate multiple counts of the same length fish of the same species in the same year:
data = dplyr::summarise(dplyr::group_by(data,
Year,
SpecCode,
LngtClass),
"Number" = sum(Number)/numAreas,
"LWa" = unique(LWa),
"LWb" = unique(LWb),
"bodyMass" = unique(bodyMass))
Number
is the average number (of each species and length) caught per hour of trawling across all seven areas.
Up until here data
is the same as it was in MEPS_IBTS_1.html.
Now, instead of only including body-mass classes above 4 g (which we did following Blanchard et al., 2005), we set the cut-off to be 100 g to see if the larger fish are fit well with the bounded power-law fit only to the larger fish:
range(data$LngtClass)
#> [1] 0 150
range(data$bodyMass)
#> [1] 0.00 35630.04
sum(data$bodyMass == 0)
#> [1] 2549
sum(data$bodyMass < 100 )
#> [1] 21781
data = dplyr::filter(data, bodyMass >= 100)
range(data$bodyMass)
#> [1] 100.00 35630.04
data
#> # A tibble: 27,410 x 7
#> # Groups: Year, SpecCode [1,423]
#> Year SpecCode LngtClass Number LWa LWb bodyMass
#> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1986 105814 45 0.00714 0.0031 3.03 315.
#> 2 1986 105814 46 0.00714 0.0031 3.03 337.
#> 3 1986 105814 50 0.00714 0.0031 3.03 434.
#> 4 1986 105814 52 0.0293 0.0031 3.03 489.
#> 5 1986 105814 53 0.0109 0.0031 3.03 518.
#> 6 1986 105814 54 0.0113 0.0031 3.03 548.
#> 7 1986 105814 56 0.0218 0.0031 3.03 612.
#> 8 1986 105814 57 0.0188 0.0031 3.03 646.
#> 9 1986 105814 58 0.0381 0.0031 3.03 680.
#> 10 1986 105814 59 0.0327 0.0031 3.03 717.
#> # ... with 27,400 more rows
summary(data)
#> Year SpecCode LngtClass Number
#> Min. :1986 Min. :101170 Min. : 9.00 Min. : 0.00058
#> 1st Qu.:1993 1st Qu.:126436 1st Qu.: 32.00 1st Qu.: 0.00866
#> Median :2001 Median :126441 Median : 42.00 Median : 0.02597
#> Mean :2001 Mean :122182 Mean : 47.96 Mean : 1.18971
#> 3rd Qu.:2009 3rd Qu.:127138 3rd Qu.: 60.00 3rd Qu.: 0.11626
#> Max. :2015 Max. :271564 Max. :150.00 Max. :265.51052
#> LWa LWb bodyMass
#> Min. :0.000300 Min. :1.797 Min. : 100.0
#> 1st Qu.:0.003300 1st Qu.:3.095 1st Qu.: 264.6
#> Median :0.004200 Median :3.175 Median : 609.5
#> Mean :0.008075 Mean :3.155 Mean : 1629.8
#> 3rd Qu.:0.005600 3rd Qu.:3.243 3rd Qu.: 1581.0
#> Max. :0.235000 Max. :3.475 Max. :35630.0
Total number of fish in this dataset is
The unique length classes are:
sort(unique(data$LngtClass))
#> [1] 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [18] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
#> [35] 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
#> [52] 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
#> [69] 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93
#> [86] 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110
#> [103] 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
#> [120] 128 129 130 131 132 133 134 135 138 139 140 142 144 145 146 149 150
Need this to stop earlier groups being kept (can mess up later code):
The equivalent Table 1 (first six and last six rows of the data) is now as follows, where the last six rows of the original Table 1 are no longer in the data as they had body masses $<$100 g:
data_biomass <- dplyr::mutate(data,
Biomass = Number * bodyMass)
knitr::kable(rbind(data_biomass[1:6,],
data_biomass[(nrow(data_biomass)-5):nrow(data_biomass),
]),
digits=c(0, 0, 0, 3, 4, 4, 2, 2))
Year | SpecCode | LngtClass | Number | LWa | LWb | bodyMass | Biomass |
---|---|---|---|---|---|---|---|
1986 | 105814 | 45 | 0.007 | 0.0031 | 3.029 | 315.46 | 2.25 |
1986 | 105814 | 46 | 0.007 | 0.0031 | 3.029 | 337.17 | 2.41 |
1986 | 105814 | 50 | 0.007 | 0.0031 | 3.029 | 434.05 | 3.10 |
1986 | 105814 | 52 | 0.029 | 0.0031 | 3.029 | 488.81 | 14.33 |
1986 | 105814 | 53 | 0.011 | 0.0031 | 3.029 | 517.84 | 5.65 |
1986 | 105814 | 54 | 0.011 | 0.0031 | 3.029 | 548.00 | 6.18 |
2015 | 150637 | 39 | 0.234 | 0.0034 | 3.260 | 522.82 | 122.17 |
2015 | 150637 | 40 | 0.103 | 0.0034 | 3.260 | 567.80 | 58.60 |
2015 | 150637 | 41 | 0.059 | 0.0034 | 3.260 | 615.40 | 36.01 |
2015 | 150637 | 42 | 0.007 | 0.0034 | 3.260 | 665.69 | 4.42 |
2015 | 150637 | 43 | 0.017 | 0.0034 | 3.260 | 718.76 | 11.94 |
2015 | 150637 | 44 | 0.010 | 0.0034 | 3.260 | 774.70 | 7.72 |
This is simplified from MEPS_IBTS_2.html – see that vignette for full explanations.
See if the number of length classes or species seems to change over time:
dataSumm = dplyr::summarise(dplyr::group_by(data, Year),
uniqLngtClass = length(unique(LngtClass)),
uniqSpec = length(unique(SpecCode)))
par(mfrow=c(2,1)) #7,1))
plot(dataSumm$Year, dataSumm$uniqLngtClass, xlab="Year",
ylab="No. unique length classes", type="o",
ylim=c(0, max(dataSumm$uniqLngtClass)))
plot(dataSumm$Year, dataSumm$uniqSpec, xlab="Year",
ylab="No. unique species", type="o", ylim=c(0, max(dataSumm$uniqSpec)))
As for the original full data set, there do not look to be any serious issue with this (no drastic changes in, for example, species identification through time).
The remaining eightMethods.count()
and plotting code from MEPS_IBTS_2.html is not repeated here since it takes a few hours to run, and our focus is on repeating the MLEbins method.
This section analyses the IBTS data using the MLEbins method, using the essential calculations from MEPS_IBTS_MLEbins.html.
LngtClass
for all species is the minimum value of a 1-cm-width bin, except for herring (Clupea harengus) and sprat (Sprattus sprattus) for which lengths are rounded down to 0.5 cm values (so the bins are 0.5-cm wide). The SpecCode
values for these are:
herringCode = dplyr::filter(specCodeNames, species == "Clupea harengus")$speccode
herringCode
#> [1] 126417
spratCode = dplyr::filter(specCodeNames, species == "Sprattus sprattus")$speccode
spratCode
#> [1] 126425
specCode05 = c(herringCode, spratCode) # species codes with 0.5cm length bins
Verified earlier that only these two species have 0.5 cm values for LngtClass
.
So LngtClass
is the minimum of each length bin. Need to work out the maximum of each length bin LengthMax
, and then use the species-specific length-weight relationships to give the min (wmin
) and max (wmax
) of each body-mass bin. So create dataBin
table dataframe that has LengthMax
, wmin
and wmax
as extra columns for each row:
dataBin = dplyr::mutate(data,
LngtMax = LngtClass + 1)
aa = which(dataBin$SpecCode %in% specCode05) # row numbers for herring, sprat
dataBin[aa, "LngtMax"] = dataBin[aa, "LngtMax"] - 0.5 # subtract 0.5 cm to
# give 0.5-cm wide bins
unique(dataBin$LngtMax - dataBin$LngtClass) # correctly just has 0.5 and 1
#> [1] 1
unique( dplyr::filter(dataBin, LngtMax - LngtClass == 0.5)$SpecCode) # just herring,sprat
#> integer(0)
dataBin = dplyr::mutate(dataBin, wmax = LWa * LngtMax^LWb) # calculate max body mass
# for each bin (min
# is currently bodyMass)
dataBin = dplyr::rename(dataBin, LngtMin = LngtClass) # For consistency
dataBin = dplyr::rename(dataBin, wmin = bodyMass)
dataBin = dataBin[ , c("Year", "SpecCode", "LngtMin", "LngtMax",
"LWa", "LWb", "wmin", "wmax", "Number")] # Reorder columns
range(dplyr::mutate(dataBin,
wminCheck = LWa * LngtMin^LWb)$wminCheck - dataBin$wmin)
#> [1] -2.273737e-13 2.273737e-13
# Verifying that wmin is correct
# (was calculated independently)
length(unique(dataBin$SpecCode))
#> [1] 86
No need to re-plot the body-mass bins for each species, since they won’t have changed from the MEPS paper (there will just be no bins below 100 g, and so some species will be omitted).
Now use the MLEbins method to fit each year of data in turn.
fullYears = sort(unique(dataBin$Year))
# Do a loop for each year, saving all the results in MLEbins.nSeaFung.new
for(iii in 1:length(fullYears))
{
dataBinForLike = dplyr::filter(dataBin,
Year == fullYears[iii])
dataBinForLike = dplyr::select(dataBinForLike,
SpecCode,
wmin,
wmax,
Number)
n = sum(dataBinForLike$Number)
xmin = min(dataBinForLike$wmin)
xmax = max(dataBinForLike$wmax)
MLEbins.nSeaFung.oneyear.new = calcLike(negLL.fn = negLL.PLB.bins.species,
p = -1.9,
suppress.warnings = TRUE,
dataBinForLike = dataBinForLike,
n = n,
xmin = xmin,
xmax = xmax)
if(iii == 1)
{
MLEbins.nSeaFung.new = data.frame(Year = fullYears[iii],
xmin=xmin,
xmax=xmax,
n=n,
b=MLEbins.nSeaFung.oneyear.new$MLE,
confMin=MLEbins.nSeaFung.oneyear.new$conf[1],
confMax=MLEbins.nSeaFung.oneyear.new$conf[2])
} else {
MLEbins.nSeaFung.new = rbind(MLEbins.nSeaFung.new,
c(fullYears[iii],
xmin,
xmax,
n,
MLEbins.nSeaFung.oneyear.new$MLE,
MLEbins.nSeaFung.oneyear.new$conf[1],
MLEbins.nSeaFung.oneyear.new$conf[2]))
}
}
# Need the standard error for weighted linear regression,
# see eightMethods.count() for details:
MLEbins.nSeaFung.new = dplyr::tbl_df(MLEbins.nSeaFung.new)
MLEbins.nSeaFung.new = dplyr::mutate(MLEbins.nSeaFung.new,
stdErr = (abs(confMin-b) +
abs(confMax-b))/(2*1.96) )
MLEbins.nSeaFung.new
#> # A tibble: 30 x 8
#> Year xmin xmax n b confMin confMax stdErr
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1986 101. 28723. 917. -2.34 -2.43 -2.26 0.0449
#> 2 1987 100 25974. 849. -2.48 -2.58 -2.38 0.0510
#> 3 1988 102. 29440. 983. -2.83 -2.95 -2.72 0.0584
#> 4 1989 102. 35211. 1058. -2.69 -2.79 -2.59 0.0520
#> 5 1990 102. 34811. 830. -2.66 -2.77 -2.55 0.0574
#> 6 1991 100 26413. 1076. -2.82 -2.93 -2.72 0.0556
#> 7 1992 100 25317. 1102. -2.82 -2.93 -2.71 0.0546
#> 8 1993 100. 29440. 1122. -2.47 -2.56 -2.38 0.0441
#> 9 1994 102. 22201. 973. -2.97 -3.10 -2.85 0.0633
#> 10 1995 102. 31819. 937. -2.67 -2.78 -2.57 0.0546
#> # ... with 20 more rows
Now to plot the results and obtain the regression fit (uncomment the commented lines to save an explicit file):
# postscript("../IBTS-min-100/trends100.eps",
# height = 6, width = 7.5, horizontal = FALSE, paper = "special")
res = timeSerPlot(MLEbins.nSeaFung.new,
legName = "(a) MLEbins",
yLim = c(-3.4, -2.1),
xLab = "Year",
method = "",
legPos = "bottomleft",
weightReg = TRUE,
xTicksSmallInc = 1,
yTicksSmallInc = 0.05)
The statistics for the regression fit (like those in Table S.1) are:
trendResultsMLEbinsNew = dplyr::tbl_df(res)
knitr::kable(dplyr::select(trendResultsMLEbinsNew, Method, Low, Trend, High, p, Rsquared),
digits=c(NA, 4, 4, 4, 2, 2))
Method | Low | Trend | High | p | Rsquared |
---|---|---|---|---|---|
0.0021 | 0.0115 | 0.021 | 0.02 | 0.18 |
For an equivalent to Table S.2 (results for each year for the MLEbins method), need the constant C for each year, so calculate it here:
MLEbins.res = MLEbins.nSeaFung.new
MLEbins.res = dplyr::mutate(MLEbins.res,
C = (b != -1 ) * (b+1) / ( xmax^(b+1) - xmin^(b+1) ) +
(b == -1) * 1 / ( log(xmax) - log(xmin) )
)
MLEbins.res = dplyr::select(MLEbins.res, -stdErr)
knitr::kable(dplyr::select(MLEbins.res, Year, xmin, xmax, n, confMin, b,
confMax, C),
digits=c(0, rep(2, 7)))
Year | xmin | xmax | n | confMin | b | confMax | C |
---|---|---|---|---|---|---|---|
1986 | 101.35 | 28722.51 | 916.52 | -2.43 | -2.34 | -2.26 | 664.97 |
1987 | 100.00 | 25974.17 | 849.17 | -2.58 | -2.48 | -2.38 | 1316.49 |
1988 | 101.61 | 29439.75 | 982.90 | -2.95 | -2.83 | -2.72 | 8642.27 |
1989 | 101.61 | 35210.99 | 1058.09 | -2.79 | -2.69 | -2.59 | 4175.41 |
1990 | 101.61 | 34811.19 | 830.41 | -2.77 | -2.66 | -2.55 | 3509.26 |
1991 | 100.00 | 26412.52 | 1076.15 | -2.93 | -2.82 | -2.72 | 8047.32 |
1992 | 100.00 | 25316.66 | 1101.53 | -2.93 | -2.82 | -2.71 | 7848.38 |
1993 | 100.00 | 29439.75 | 1121.89 | -2.56 | -2.47 | -2.38 | 1270.45 |
1994 | 101.61 | 22200.90 | 972.95 | -3.10 | -2.97 | -2.85 | 17969.56 |
1995 | 101.61 | 31818.64 | 936.68 | -2.78 | -2.67 | -2.57 | 3805.70 |
1996 | 101.61 | 36462.12 | 1117.99 | -2.91 | -2.80 | -2.70 | 7503.47 |
1997 | 100.00 | 19360.99 | 1188.82 | -2.83 | -2.73 | -2.63 | 5028.09 |
1998 | 101.61 | 22801.53 | 820.29 | -2.68 | -2.57 | -2.47 | 2258.31 |
1999 | 101.61 | 36462.12 | 892.03 | -2.62 | -2.51 | -2.42 | 1648.97 |
2000 | 101.35 | 22801.53 | 988.64 | -3.09 | -2.97 | -2.85 | 17382.96 |
2001 | 101.61 | 18824.87 | 1927.54 | -3.39 | -3.28 | -3.18 | 87714.71 |
2002 | 100.00 | 20464.76 | 1612.89 | -2.66 | -2.58 | -2.50 | 2272.10 |
2003 | 100.00 | 21611.31 | 1402.35 | -2.48 | -2.40 | -2.33 | 893.83 |
2004 | 101.61 | 30911.23 | 990.74 | -2.43 | -2.34 | -2.26 | 658.50 |
2005 | 100.00 | 19360.99 | 1294.30 | -2.57 | -2.49 | -2.41 | 1418.93 |
2006 | 100.00 | 21032.63 | 667.38 | -2.39 | -2.29 | -2.19 | 487.54 |
2007 | 100.00 | 18299.10 | 1150.19 | -2.99 | -2.88 | -2.77 | 10584.74 |
2008 | 100.00 | 28722.51 | 759.07 | -2.44 | -2.34 | -2.25 | 654.72 |
2009 | 100.00 | 24036.35 | 732.37 | -2.72 | -2.60 | -2.49 | 2542.18 |
2010 | 101.61 | 23293.68 | 1344.75 | -2.46 | -2.38 | -2.31 | 820.76 |
2011 | 101.35 | 26723.46 | 1742.68 | -2.69 | -2.61 | -2.54 | 2738.92 |
2012 | 100.00 | 14899.37 | 1177.94 | -2.46 | -2.38 | -2.30 | 789.30 |
2013 | 100.00 | 25974.17 | 1213.46 | -2.42 | -2.34 | -2.27 | 647.73 |
2014 | 100.00 | 30025.12 | 948.17 | -2.24 | -2.16 | -2.09 | 245.86 |
2015 | 100.00 | 18362.51 | 792.09 | -2.54 | -2.44 | -2.34 | 1088.61 |
Recommended plotting approach of results for each year, as in MEPS_IBTS_recommend.html.
First just get what’s needed for the calculations for the recommended plot:
dataRecommend.isd = dplyr::select(dataBin,
Year,
wmin,
wmax,
Number)
data.year.list = list() # to save results for each year
fullYears = sort(unique(dataBin$Year))
for(i in 1:length(fullYears))
{
data.year = dplyr::filter(dataRecommend.isd,
Year == fullYears[i])
data.year = dplyr::arrange(data.year,
desc(wmin))
sumNumber = sum(data.year$Number)
# Have to do not with dplyr:
wmin.vec = data.year$wmin
wmax.vec = data.year$wmax
num.vec = data.year$Number
countGTEwmin = rep(NA, length(num.vec)) # to do a manual count
lowCount = countGTEwmin
highCount = countGTEwmin
for(iii in 1:length(countGTEwmin))
{
countGTEwmin[iii] = sum( (wmin.vec >= wmin.vec[iii]) * num.vec)
lowCount[iii] = sum( (wmin.vec >= wmax.vec[iii]) * num.vec)
highCount[iii] = sum( (wmax.vec > wmin.vec[iii]) * num.vec)
}
data.year = cbind(data.year,
"countGTEwmin" = countGTEwmin,
"lowCount" = lowCount,
"highCount" = highCount)
data.year = dplyr::tbl_df(data.year)
data.year.list[[i]] = data.year
}
Here is the code to give an animation for the equivalent Figures 7 and S.5-S.34 for the IBTS data (now with minimum body size set to 100 g). For each year (given in top-right corner) the plot shows the individual size distribution and MLEbins fit (red solid curve) with 95% confidence intervals (red dashed curves, may be hard to see). For each bin, the horizontal green line shows the range of body sizes, with value on the y-axis corresponding to the total number of individuals in bins whose minima are \(\geq\) the bin’s minimum. For each bin, the vertical span of the grey rectangle shows the possible range of the number of individuals with body mass \(\geq\) the body mass of individuals in that bin (horizontal span is the same as for the green lines). The text in (a) gives the year, the MLE for the size-spectrum exponent \(b\), and the sample size \(n\).
Here is the code to build the movie, but it is commented out since it causes Travis to fail – Travis is the continuous integration service that automatically checks the packages builds every time a change is committed to GitHub, giving the little green symbol passing
icon on the main page. The trick to showing the movie is to run the code uncommented (you need to install the gifski
package), right-click on the animation in the html viewer and save it. I’ve done that and so am leaving this code commented, and will just refer to the saved animation file.
# ```
# {r, animation.hook = 'gifski', interval = 1.5, fig.width = 5.36, fig.height = 8}
## fig.width is 0.67 * fig.height (which is 8)
#
#for(i in 1:length(fullYears))
# {
# ISD_bin_plot(data.year = data.year.list[[i]],
# b.MLE = dplyr::filter(MLEbins.res, Year == fullYears[i])$b,
# b.confMin = dplyr::filter(MLEbins.res, Year ==
# fullYears[i])$confMin,
# b.confMax = dplyr::filter(MLEbins.res, Year ==
# fullYears[i])$confMax,
# year = fullYears[i],
# xlim = xlim.global,
# xmin = dplyr::filter(MLEbins.res, Year ==
# fullYears[i])$xmin,
# xmax = dplyr::filter(MLEbins.res, Year ==
# fullYears[i])$xmax,
# yBig.inc = 500,
# ySmall.inc = 100)
# }
# ```
The resulting animation is
.
To save each year as it’s own figure do this (not it running here; you have to make the IBTS-min-100
directory first):
for(i in 1:length(fullYears))
{
postscript(paste0("../IBTS-min-100/IBTS-ISD", fullYears[i], ".eps"),
height = 8, width = 5.36,
horizontal=FALSE, paper="special")
ISD_bin_plot(data.year = data.year.list[[i]],
b.MLE = dplyr::filter(MLEbins.res, Year == fullYears[i])$b,
b.confMin = dplyr::filter(MLEbins.res, Year ==
fullYears[i])$confMin,
b.confMax = dplyr::filter(MLEbins.res, Year ==
fullYears[i])$confMax,
year = fullYears[i],
xlim = xlim.global,
xmin = dplyr::filter(MLEbins.res, Year ==
fullYears[i])$xmin,
xmax = dplyr::filter(MLEbins.res, Year ==
fullYears[i])$xmax,
yBig.inc = 500,
ySmall.inc = 100)
dev.off()
}
See MEPS_IBTS_recommend.html for inserting all the figures into a LaTeX document using a loop.